feat: lower repartition_file_min_size default from 10 MiB to 1 MiB #22439
adriangb wants to merge 1 commit into
`repartition_file_min_size` gates how aggressively `repartitioned()` splits file groups by byte range to fan a scan out across `target_partitions` worth of cores. At 10 MiB the default leaves several SF1-sized dimension tables (TPC-H `part` ≈ 24 MiB, TPC-DS `customer_address` ≈ 7 MiB, …) on a single partition, so any CPU-bound per-batch work in the scan (filter eval, dictionary expansion, etc.) is single-threaded even when the cluster has plenty of idle cores.

At 1 MiB those same files split cleanly into `target_partitions` byte ranges. For example, TPC-H Q22 drops from 30 ms to 17 ms (~1.75× faster) on a 12-core SF1 run by parallelising the `part_with_promo` filter. The cost (more `open()` calls, more metadata loads) is small: at most 10 extra opens per file in the worst case, each amortised over the row-group / page-index reads. The existing knob is still available for workloads where it matters.

The `csv_files.slt` reset is switched from `SET ... = 10485760` to `RESET ...` so the test continues to round-trip the configured default regardless of what that default is.
run benchmarks

🤖 Benchmark running (GKE): comparing lower-repartition-file-min-size (5f6b84f) to 50d74a7 (merge-base) using tpch.

🤖 Benchmark running (GKE): comparing lower-repartition-file-min-size (5f6b84f) to 50d74a7 (merge-base) using clickbench_partitioned.

🤖 Benchmark running (GKE): comparing lower-repartition-file-min-size (5f6b84f) to 50d74a7 (merge-base) using tpcds.

🤖 Benchmark completed (GKE): tpch, resource usage for base (merge-base) and branch.

🤖 Benchmark completed (GKE): tpcds, resource usage for base (merge-base) and branch.

🤖 Benchmark completed (GKE): clickbench_partitioned, resource usage for base (merge-base) and branch.
Summary
`repartition_file_min_size` gates how aggressively `repartitioned()` splits file groups by byte range to fan a scan out across `target_partitions` worth of cores. At 10 MiB the default leaves several SF1-sized dimension tables (TPC-H `part` ≈ 24 MiB, TPC-DS `customer_address` ≈ 7 MiB, …) on a single partition, so any CPU-bound per-batch work in the scan (filter eval, dictionary expansion, etc.) is single-threaded even when the cluster has plenty of idle cores.

At 1 MiB those same files split cleanly into `target_partitions` byte ranges. The cost (more `open()` calls, more metadata loads) is small in absolute terms (≤10 extra opens per file in the worst case, each amortised over the row-group / page-index reads) and the existing knob is still available for workloads where it matters.
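To make the gating concrete, here is a minimal Rust sketch of size-based splitting. It is a simplified model, not DataFusion's actual code: the names `split_by_size` and `FileRange` are hypothetical, and it assumes the threshold is compared against the total byte size of the files in the scan and that ranges are cut at `ceil(total / target_partitions)` bytes.

```rust
/// A byte range `[start, end)` of one file assigned to a scan partition.
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    file: usize, // index into the input `file_sizes` slice
    start: u64,
    end: u64,
}

/// Hypothetical, simplified model of size-based repartitioning.
/// Returns `None` when the scan is smaller than `min_size` (or empty),
/// meaning the existing file groups are left untouched.
fn split_by_size(
    file_sizes: &[u64],
    target_partitions: usize,
    min_size: u64,
) -> Option<Vec<Vec<FileRange>>> {
    let total: u64 = file_sizes.iter().sum();
    if target_partitions == 0 || total == 0 || total < min_size {
        return None;
    }
    // Ceiling division, so at most `target_partitions` groups are produced.
    let per_partition = (total + target_partitions as u64 - 1) / target_partitions as u64;
    let mut groups: Vec<Vec<FileRange>> = vec![Vec::new()];
    let mut room = per_partition; // bytes still free in the current group
    for (file, &size) in file_sizes.iter().enumerate() {
        let mut start = 0;
        while start < size {
            if room == 0 {
                // Current group is full: open the next one.
                groups.push(Vec::new());
                room = per_partition;
            }
            let end = size.min(start + room);
            groups.last_mut().unwrap().push(FileRange { file, start, end });
            room -= end - start;
            start = end;
        }
    }
    Some(groups)
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    // One file of roughly the size quoted for `customer_address`.
    let files = [7 * MIB];
    for min_size in [10 * MIB, MIB] {
        match split_by_size(&files, 12, min_size) {
            None => println!("min_size = {} MiB: not split", min_size / MIB),
            Some(g) => println!("min_size = {} MiB: {} byte ranges", min_size / MIB, g.len()),
        }
    }
}
```

Under these assumptions, a 7 MiB file stays whole with a 10 MiB threshold and is cut into 12 byte ranges with a 1 MiB threshold and `target_partitions = 12`. The real reader also has to align each range to row-group or line boundaries, which this sketch ignores.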
Benchmark numbers
12-core, SF1, with the existing dynamic-filter-pushdown defaults preserved:
Test plan